Search results for "Cycles per instruction"

showing 5 items of 5 documents

Multi-objective optimisations for a superscalar architecture with selective value prediction

2012

This work extends an earlier manual design space exploration of our developed Selective Load Value Prediction based superscalar architecture to the L2 unified cache. After that we perform an automatic design space exploration using a specially developed software tool by varying several architectural parameters. Our goal is to find optimal configurations in terms of CPI (Cycles per Instruction) and energy consumption. By varying 19 architectural parameters, as we proposed, the design space is over 2.5 million billion configurations, which obviously means that only heuristic search can be considered. Therefore, we propose different methods of automatic design space exploration based…

Hardware and Architecture; Computer science; Cycles per instruction; Superscalar; Value (computer science); Parallel computing; Cache; Energy consumption; Electrical and Electronic Engineering; Design space; Software; Space exploration; Sign (mathematics)
IET Computers & Digital Techniques
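As a rough illustration of the two quantities named in the abstract above, the sketch below computes CPI as total clock cycles divided by executed instructions, and the size of a design space as the product of the number of values each parameter can take. The function names and the example numbers are hypothetical and are not taken from the paper.

```python
import math


def cycles_per_instruction(cycles: int, instructions: int) -> float:
    """CPI = total clock cycles / instructions executed."""
    if instructions <= 0:
        raise ValueError("instructions must be positive")
    return cycles / instructions


def design_space_size(values_per_parameter: list[int]) -> int:
    """Number of configurations when every parameter varies independently."""
    return math.prod(values_per_parameter)


# Hypothetical numbers, for illustration only.
example_cpi = cycles_per_instruction(1_500, 1_000)
example_size = design_space_size([2, 3, 4])
```

Because the size is a product, it grows multiplicatively with each added parameter, which is why an exhaustive search over 19 parameters is impractical and the abstract turns to heuristic search.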

Versatile Direct and Transpose Matrix Multiplication with Chained Operations: An Optimized Architecture Using Circulant Matrices

2016

With growing demands in real-time control, classification or prediction, algorithms become more complex while low power and small size devices are required. Matrix multiplication (direct or transpose) is common for such computation algorithms. In numerous algorithms, it is also required to perform matrix multiplication repeatedly, where the result of a multiplication is further multiplied again. This work describes a versatile computation procedure and architecture: one of the matrices is stored in internal memory in its circulant form, then, a sequence of direct or transpose multiplications can be performed without timing penalty. The architecture proposes a RAM-ALU block for each matrix c…

Cycles per instruction; Block matrix; 020206 networking & telecommunications; 02 engineering and technology; Parallel computing; Matrix chain multiplication; Matrix multiplication; 020202 computer hardware & architecture; Theoretical Computer Science; Matrix (mathematics); Computational Theory and Mathematics; Hardware and Architecture; Transpose; 0202 electrical engineering electronic engineering information engineering; Multiplication; Hardware_ARITHMETICANDLOGICSTRUCTURES; Arithmetic; Circulant matrix; Software; Mathematics
IEEE Transactions on Computers

A 16 channel high resolution (<11 ps RMS) Time-to-Digital Converter in a Field Programmable Gate Array

2012

A 16-channel Time-to-Digital Converter (TDC) was implemented in a general purpose Field-Programmable Gate Array (FPGA). The fine time calculations are achieved by using the dedicated carry-chain lines. The coarse counter defines the coarse time stamp. In order to overcome the negative effects of temperature and power supply dependency, bin-by-bin calibration is applied. The time interval measurements are done using 2 channels. The time resolution of the channels is calculated for 1 clock cycle, and a minimum of 10.3 ps RMS on two channels, yielding 7.3 ps RMS (10.3 ps/√2) on a single channel, is achieved.

Physics; Cycles per instruction; business.industry; Electrical engineering; Power (physics); Time-to-digital converter; Optics; Gate array; Calibration; Timestamp; business; Field-programmable gate array; Instrumentation; Mathematical Physics; Communication channel
Journal of Instrumentation
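The single-channel figure quoted in the TDC abstract above follows from dividing the two-channel RMS by √2. A minimal sketch of that arithmetic, assuming the two channels contribute equal and independent timing jitter so that their variances add (the function name is hypothetical):

```python
import math


def single_channel_rms(pair_rms_ps: float) -> float:
    """Per-channel RMS, assuming two equal, independent channels.

    If each channel has RMS s, the interval between them has RMS s * sqrt(2),
    so s = pair_rms / sqrt(2).
    """
    return pair_rms_ps / math.sqrt(2)


# 10.3 ps RMS measured across two channels, as stated in the abstract.
single = single_channel_rms(10.3)
print(f"{single:.1f} ps RMS per channel")
```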

Improving Computing Systems Automatic Multiobjective Optimization Through Meta-Optimization

2016

This paper presents an extension of the framework for automatic design space exploration (FADSE) tool using a meta-optimization approach, which improves the performance of design space exploration algorithms by driving two different multiobjective meta-heuristics concurrently. More precisely, we selected two genetic multiobjective algorithms, 1) non-dominated sorting genetic algorithm-II and 2) strength Pareto evolutionary algorithm 2, that work together in order to improve both the solutions’ quality and the convergence speed. With the proposed improvements, we ran FADSE in order to optimize the hardware parameters’ values of the grid ALU processor (GAP) micro-architecture from a b…

Mathematical optimization; Meta-optimization; Computer science; Cycles per instruction; Design space exploration; Pareto principle; Sorting; Evolutionary algorithm; 02 engineering and technology; Computer Graphics and Computer-Aided Design; Multi-objective optimization; 020202 computer hardware & architecture; 0202 electrical engineering electronic engineering information engineering; 020201 artificial intelligence & image processing; Algorithm design; Electrical and Electronic Engineering; Software
IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems
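Both algorithms named in the meta-optimization abstract above rank candidates by Pareto dominance. The sketch below shows only that underlying idea, a plain non-dominated filter for objectives that are all minimised. It is not NSGA-II, SPEA2, or FADSE itself, and the function names and sample points are hypothetical.

```python
from typing import Sequence

Point = tuple[float, ...]


def dominates(a: Point, b: Point) -> bool:
    """True if a is no worse than b in every objective and strictly better in one."""
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def pareto_front(points: Sequence[Point]) -> list[Point]:
    """Keep the points that no other point dominates (input order preserved)."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical two-objective values, both to be minimised.
candidates = [(1.2, 5.0), (1.0, 6.0), (1.3, 5.5), (0.9, 7.0)]
front = pareto_front(candidates)
```

This filter compares every pair of points, which is fine for a handful of candidates. Evolutionary algorithms add ranking, diversity preservation, and selection on top of the same dominance test.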

Synthesizing on a reconfigurable chip an autonomous robot image processing system

2003

This paper deals with the implementation, in a high density reconfigurable device, of an entire log-polar image processing system. Log-polar vision reduces the amount of data to be stored and processed, simplifying several vision algorithms and making possible the implementation of a complete processing system on a single chip. This image processing system is especially appropriate for autonomous robotic navigation, since these platforms typically have power consumption, size and weight restrictions. Furthermore, the image processing algorithms involved are time consuming and often have real-time restrictions as well. A reconfigurable approach on a single chip combines hardwar…

Cycles per instruction; Computer science; business.industry; Embedded system; Digital image processing; Control reconfiguration; System on a chip; Image processing; Autonomous robot; Chip; business; Pipeline (software)